Why?

Randomized quantile residual was proposed in the literature [in 1996] to circumvent the…problems in traditional residuals [for generalized linear models]. However, this approach has not gained deserved awareness and attention, partly due to the lack of extensive empirical studies to investigate its performance.

–Feng et al, 2017


Deep dive into quantile residuals for model evaluation and the most popular implementation of them, DHARMa.

State of DHARMa

  • DHARMa was first released in 2016. CRAN downloads were sparse until 2019, then began steadily increasing; it now averages approximately 500 downloads per day (source: https://hadley.shinyapps.io/cran-downloads/).

  • It has 18 reverse dependencies among R packages, including glmmTMB, performance, and easystats.

  • DHARMa does not have a published peer-reviewed manuscript supporting its release.

Linear Mixed Model

\[y_{ij} = x_i\beta + Z_ja + \epsilon_{ij} \]

\[\epsilon \sim N(0, \sigma^2 \mathbf{I_n}) \] \[a \sim N(0, \sigma_a^2 \mathbf{I_r}) \]

\[ \bar{y}_{i.}\,|\,a_{j} \sim N(x_i \beta + a_j,\ \sigma^2/r_i) \]

\[ \bar{y}_{i.} \sim N(x_i \beta,\ \sigma_a^2 + \sigma^2/r_i) \]

‘Standard’ Residuals

Raw \[\epsilon_i = Y_i - \hat{Y_i} \]

(internally) Studentized:

\[\epsilon_i = \frac {Y_i - \hat{Y_i}} {\hat{s}}\]

Pearson/Scaled:

\[ \epsilon_i = \frac {Y_i - \hat{Y_i}} {sd(Y_i)} \]
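These quantities are easy to verify in base R. A minimal sketch, with invented data and a simple Poisson GLM (no mixed model, for brevity):

```r
# Raw and Pearson residuals from a Poisson GLM, computed by hand and
# checked against R's built-in residual types. Data are simulated.
set.seed(1)
x <- runif(100)
y <- rpois(100, exp(0.5 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)

raw     <- y - fitted(fit)          # Y_i - Y_hat_i
pearson <- raw / sqrt(fitted(fit))  # for Poisson, Var(Y_i) = mu_i, so sd = sqrt(mu_i)

all.equal(unname(raw),     unname(residuals(fit, type = "response")))
all.equal(unname(pearson), unname(residuals(fit, type = "pearson")))
```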

Standard Residuals from GLMMs Can Be Nonsensical

library(lme4)
data("VerbAgg", package = "lme4")
m2 <- glmer(r2 ~ (Anger + Gender + btype + situ)^2 + (1|id) + (1|item), 
            family = binomial, data = VerbAgg, nAGQ = 0L)
plot(m2)

Raw/Scaled Residuals Can Lack Diagnostic Capabilities for GLMMs

  • The residuals often have no defined reference distribution.
  • Visual patterns are difficult to interpret.
  • Over- and under-dispersion are difficult to assess.
  • Goodness-of-fit tests are not valid outside of normally distributed responses.


Different distributional assumptions, and the built-in mean-variance relationships of many distributions, require a different approach.

The Original

Screenshot of original paper title

Quantile Residual Background

For \(y_1,...,y_n\) responses:

\[y_i \sim \mathcal{P}(\mu_i, \phi)\]

CDF: \[F(y; \mu, \phi)\]

\[ r_{i,q} = \Phi^{-1}(u_i) \]

\[ u_i \sim Uniform(a_i, b_i] \]

\[ a_i = \lim_{y \rightarrow y_i^-} F(y; \hat{\mu}_i, \hat{\phi}) \]

\[ b_i = F(y_i; \hat{\mu}_i, \hat{\phi}) \]

Quantile Residual Background

For a discontinuous \(F\):

  • \(a_i = Pr(Y < y_i)\)
  • \(b_i = Pr(Y \le y_i)\)
  • \(u_i\) is drawn from the interval \((a_i, b_i]\)

For a continuous \(F\):

  • \(a=b\)
  • \(F(y_i; \mu_i, \phi)\) are uniformly distributed if the model is correct; and
  • the quantile residuals are:

\[ r_{i,q} = \Phi^{-1} \left\{F(y_i; \hat{\mu_i}, \hat{\phi} )\right\}\]
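The recipe above can be sketched in a few lines of base R for a discrete (Poisson) response; statmod::qresiduals() implements this for fitted model objects. The mean here is treated as known, for illustration:

```r
# Randomized quantile residuals for Poisson counts, by hand.
set.seed(42)
mu <- 5                       # "fitted" mean, assumed known for this sketch
y  <- rpois(200, mu)

a <- ppois(y - 1, mu)         # a_i = P(Y <  y_i); ppois(-1, mu) = 0 handles y = 0
b <- ppois(y,     mu)         # b_i = P(Y <= y_i)
u <- runif(length(y), a, b)   # u_i ~ Uniform(a_i, b_i]  (runif is vectorized)
r <- qnorm(u)                 # r_{i,q} = Phi^{-1}(u_i); ~ N(0, 1) if the model is right
```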

Method implemented in ‘statmod’

Screenshot of statmod on CRAN

This Method Has Been Around

screenshot of 2003 statmod archive


Randomization is used to produce continuously distributed residuals when the response is discrete or has a discrete component. This means that the quantile residuals will vary from one realization to another for a given data set and fitted model.


–Dunn & Smyth (1996)

DHARMa

(Diagnostics for HierArchical Regression Models) Screenshot of dharma on CRAN

The DHARMa Process

  1. Model a process using a generalized linear mixed model with a given distribution and link function.

\[ \mathbf{Y|a\sim \mathcal{P}(\mu, \phi)} \] linear predictor: \(\mathbf{\eta = X\beta + Za}\)

fitted value: \(E(\mathbf{Y|a}) = \mu\)

Link function: \(\eta = g(\mu)\)
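These pieces can be inspected directly with base R's family objects. A small sketch for a Poisson model with log link (the value of \(\mu\) is invented):

```r
# Link and inverse link for a Poisson family.
fam <- poisson()        # default link is log
mu  <- 4                # a fitted value E(Y)
eta <- fam$linkfun(mu)  # linear predictor: eta = g(mu) = log(mu)
fam$linkinv(eta)        # inverse link recovers the mean mu
```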

The DHARMa Process

For each observation in the data set:

  1. Simulate \(n_{sim}\) new datasets from the fitted model, using the fitted values and estimated distributional parameters (e.g. shape, scale) as appropriate. \[ Y_i^{sim} \sim \mathcal{P}(\hat{\mu}_i, \hat{\phi}) \]

  2. Calculate the observed value's quantile within the empirical cumulative distribution of its simulations

\[ r_{i,q} = \hat{F}(y_i; \hat{\mu}_i, \hat{\phi} )\]

Count variables: add a random value \(\sim \mathcal{U}(a, b)\)

\(a =\) empirical probability simulated values are less than observed

\(b =\) empirical probability simulated values are less than or equal to observed
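A minimal base-R sketch of this simulation-based recipe (DHARMa's actual implementation adds further details; the means and data below are fabricated for illustration):

```r
# Simulation-based quantile residuals for counts.
set.seed(7)
n <- 100; nSim <- 250
mu  <- rep(4, n)                             # "fitted" means
obs <- rpois(n, mu)                          # observed counts
sim <- matrix(rpois(n * nSim, mu), nrow = n) # one simulated dataset per column

a <- rowMeans(sim <  obs)   # empirical P(sim <  y_i)
b <- rowMeans(sim <= obs)   # empirical P(sim <= y_i)
res <- runif(n, a, b)       # randomize within the tie interval
# res should be approximately Uniform(0, 1) when the model is correct
```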

The end

Expectations of the Quantile Residuals

\[ r_{i,q} \sim Uniform(0,1) \]

\(r_{i,q} = 0\): all simulated values are larger than the observation

\(r_{i,q} = 1\): all simulated values are smaller than the observation

\(r_{i,q} = 0.5\): the observation sits right in the middle of its simulations

DHARMa runs 250 simulations per observation by default; the package authors recommend up to 1000.

Poisson Example

\[ \hat{Y}_i = \hat{\lambda} = 5, \quad Y_i = 7 \]
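Plugging these values into the analytic formulas from earlier:

```r
# Quantile residual for one Poisson observation: fitted lambda = 5, observed y = 7.
lambda <- 5; y <- 7
a <- ppois(y - 1, lambda)   # P(Y <  7), approx 0.762
b <- ppois(y,     lambda)   # P(Y <= 7), approx 0.867
set.seed(1)
u <- runif(1, a, b)         # one randomized draw from (a, b]
r <- qnorm(u)               # the corresponding normal-scale residual
```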

Quantile Residuals for 1 Observation

Poisson GLMM Example

Create a data set with a random effect (“group”, 10 levels), a covariate (“Environment1”) and a Poisson-distributed response variable (“observedResponse”):

testData <- createData(sampleSize = 500)  # createData() is a DHARMa helper

Analyze correctly and incorrectly:

rightModel <- glmer(observedResponse ~ Environment1 + (1|group) , 
                     family = "poisson", data = testData)

wrongModel <- lmer(observedResponse ~ Environment1 + (1|group) , 
                     data = testData)
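The same right-vs-wrong contrast can be reproduced without the mixed-model machinery, using base R only (no random effect; variable names here echo the example but the data are simulated fresh):

```r
# A Poisson GLM (right) vs. a Gaussian LM (wrong) on the same count data.
set.seed(11)
Environment1     <- runif(300)
observedResponse <- rpois(300, exp(0.2 + 2 * Environment1))

rightFit <- glm(observedResponse ~ Environment1, family = poisson)
wrongFit <- lm(observedResponse ~ Environment1)

# The Gaussian fit ignores the Poisson mean-variance relationship; this shows
# up as fanning in its raw residuals but is hard to judge by eye alone.
```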

‘Standard’ Residuals

Quantile Residuals

Correctly Specified Model

What’s in the Plots?

  1. x-axis: rank-transformed predictions, rank(preds)/max(rank) (ties method = “average”).
  2. Kolmogorov-Smirnov test for uniformity (ks.test()) against a standard uniform distribution on (0, 1).
  3. Dispersion test: compares the variance of the observations to the variance of the simulations; the p-value is the frequency with which the simulated dispersion is as extreme as the observed dispersion (two-sided).
  4. Outliers and outlier test: outliers are residual values of exactly 0 or 1. The outlier test uses binom.test() to evaluate whether the number of outliers is consistent with the expectation for the dataset size (expected frequency per tail: 1/(nSim + 1)).
  5. Quantile tests: fits splines (qgam::qgam()) at the 0.25, 0.5, and 0.75 quantiles and tests for deviations from the expected flat lines, using a Benjamini-Hochberg adjustment.
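The uniformity check (item 2) is plain ks.test() under the hood. A sketch with fabricated residuals standing in for simulateResiduals()$scaledResiduals:

```r
# KS test of scaled residuals against Uniform(0, 1).
set.seed(3)
res <- runif(500)               # stand-in for DHARMa's scaled residuals
ks  <- ks.test(res, "punif", 0, 1)
ks$statistic                    # max distance between empirical and uniform CDFs
ks$p.value                      # usually non-significant for a correct model
```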

Quantile Residuals

Incorrectly Specified Model

Distributions of Quantile Residuals

Quantile Function of the Standard Normal Distribution

Notes on DHARMa Implementation

  • It depends on the simulation functions built into the fitted model’s package.
  • Supported models: lm(), glm(), and fits from lme4, mgcv, glmmTMB, spaMM, GLMMadaptive, brms, & more.
  • Commonly, only the last stochastic level (e.g. the Poisson draw) is simulated, conditional on the fitted random effects; this makes it basically an omnibus test.
  • Residuals can be examined against individual predictors, and tests can be conducted for individual factors (highly recommended!).
  • There are additional tests for spatial and temporal autocorrelation, zero-inflation, and over/under-dispersion.

lme4::glmer() Marginal Model

By default, simulations are conducted on a marginal model:

rightModel <- glmer(observedResponse ~ Environment1 + (1|group) , 
                     family = "poisson", data = testData)
sr <- simulateResiduals(rightModel, re.form = NA, plot = TRUE)

lme4::glmer() Conditional Model

Use the re.form argument of simulateResiduals() (passed on to simulate()) to condition the simulations on the fitted random effects.

res2 <- simulateResiduals(fittedModel = rightModel, re.form = NULL, plot = TRUE)

glmmTMB() Conditional Model

  • By default, simulations are conducted on the conditional model:
data("sleepstudy", package = "lme4")
m1 <- glmmTMB(Reaction ~ Days  + (Days|Subject), sleepstudy)
m1_res <- simulateResiduals(m1, plot = TRUE)

glmmTMB() Marginal Model

  • Reset the simulation conditions for a glmmTMB object.
  • The implementation is a very recent addition and lacks clear documentation.
m2 <- m1; set_simcodes(m2$obj, val = "fix") # fixed effects only
m2_res <- simulateResiduals(m2, plot = TRUE)

Best Practices

  • Feng et al. (2017) demonstrated across many scenarios (misspecified gamma, negative binomial, Poisson, and zero-inflated models) that quantile residuals are a better diagnostic tool than ‘standard’ residuals for checking the model distribution, model parameters, over/under-dispersion, and overall lack of fit of GLMMs.
  • Fully conditional simulations are recommended for an omnibus diagnostic; if the model is correct, this choice should not matter, but conditional simulation is more sensitive.
  • We do not know the residuals’ overall sensitivity to modest model misspecifications.
  • Results from quantile tests for uniformity should be treated with caution: no pattern proves a model is correct, and likewise, non-random patterns are not always a deal-breaker.
  • Identified outliers are warnings of possible problems and carry no quantitative information.

Final Thoughts

  • There is an overall lack of information on using quantile residuals in a mixed-model context and on the application of conditional versus marginal quantile residuals.
  • RTFM: read the DHARMa vignette and, when needed, the function documentation. Not all R documentation is well written and helpful, but this package’s is.
  • The quantile residuals described herein do not apply to multinomial models. An extension of quantile residuals for these models has been developed, but it lacks an easy implementation.

Sources

  • Bates DM, Maechler M, Bolker BM and S Walker (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

  • Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, Skaug HJ, Maechler M and BM Bolker (2017). “glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling.” The R Journal, 9(2), 378-400. doi: 10.32614/RJ-2017-066.

  • Dunn PK and GK Smyth (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236-244.

  • Feng et al (2017). Randomized quantile residuals: an omnibus model diagnostic tool with unified reference distribution. arXiv https://doi.org/10.48550/arXiv.1708.08527

  • Gelman A and J Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.

Sources

  • Gerber EAE and BA Craig (2024) Residuals and diagnostics for multinomial regression models. Statistical Analysis and Data Mining: An ASA Data Science Journal 17:e11645. https://doi.org/10.1002/sam.11645
  • Liu D, Zhang H (2018) Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. J Am Stat Assoc 113:845–854. https://doi.org/10.1080/01621459.2017.1292915
  • Pinheiro JC and DM Bates (2000). Mixed-Effects Models in S and S-PLUS. Springer Verlag, New York.

  • Stroup WW, Ptukhina M and J Garai (2024). Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. CRC Press, Boca Raton.



(G)LMMs are hard - harder than you may think based on what you may have learned in your second statistics class….


–Ben Bolker, GLMM FAQ